NPU Enabling and Usage

Introduction

The Dragonwing IQ-9075 EVK has a dedicated NPU that delivers up to 100 dense TOPS and can run 13-billion-parameter models at 12 tokens per second.

On Yocto, the recipes-ml layer provides recipes for some Qualcomm AI Runtime (QAIRT) SDK components. On Ubuntu, only part of the SDK is included, enough to run some sample applications and GStreamer pipelines.

Specification	Value
NPU name	Qualcomm Hexagon^[1]
NPU architecture	Dual Hexagon Tensor Processors^[2]
Compute extensions	Vector and matrix extensions^[3]
Vector accelerator	Quad Qualcomm Hexagon Vector eXtensions (HVX)^[4]
Matrix accelerator	Dual Qualcomm Hexagon Matrix eXtensions (HMX) coprocessors^[4]
Integrated DSP	Qualcomm Hexagon DSP^[4]
Peak NPU performance, QCS9075-AC	Up to 50 dense TOPS^[1]
Peak NPU performance, QCS9075-AA	Up to 100 dense TOPS^[1]
Peak sparse-equivalent performance	Up to 200 equivalent sparse TOPS^[2]
INT8 AI performance	Up to 100 INT8 TOPS^[3]
Example generative AI workload	Llama 2 7B at up to 22 tokens/s^[1]
NPU software backend	QNN HTP backend / Hexagon Tensor Processor backend^[5]
Quantized network support	Quantized 8-bit and quantized 16-bit networks^[5]
Floating-point support	Float32 networks using float16 math on select Qualcomm SoCs^[5]
Operator / layer support source	QAIRT / QNN Supported Operations, HTP backend columns^[6]

The full QAIRT SDK supports the following operations:

Supported layers	Layer type	Datatype	Backend
Conv1d, Conv2d, Conv3d, DepthWiseConv1d, DepthWiseConv2d	Convolution	FP32 / FP16 / INT8 / INT16	CPU, HTP, HTP FP16, GPU, LPAI
FullyConnected, MatMul	Dense / matrix multiplication	FP32 / FP16 / INT8 / INT16	CPU, HTP, HTP FP16, GPU, LPAI
PoolAvg2d, PoolAvg3d, PoolMax2d, PoolMax3d, L2Pool2d	Pooling	FP32 / FP16 / INT8 / INT16	CPU, HTP, HTP FP16, GPU, LPAI
Relu, Prelu, Elu, Gelu, HardSwish, Sigmoid, Tanh, Softplus	Activation	FP32 / FP16 / INT8 / INT16	CPU, HTP, HTP FP16, GPU, LPAI
ElementWiseAdd, ElementWiseSubtract, ElementWiseMultiply, ElementWiseDivide, ElementWisePower, ElementWiseMaximum, ElementWiseMinimum, ElementWiseSquaredDifference	Element-wise arithmetic	FP32 / FP16 / INT8 / INT16	CPU, HTP, HTP FP16, GPU, LPAI
ElementWiseAbs, ElementWiseCeil, ElementWiseCos, ElementWiseExp, ElementWiseFloor, ElementWiseLog, ElementWiseNeg, ElementWiseRound, ElementWiseRsqrt, ElementWiseSin, ElementWiseSquareRoot	Element-wise unary	FP32 / FP16	CPU, HTP, HTP FP16, GPU, LPAI
ElementWiseEqual, ElementWiseNotEqual, ElementWiseGreater, ElementWiseGreaterEqual, ElementWiseLess, ElementWiseLessEqual	Element-wise comparison	FP32 / FP16 / INT8 / INT16	CPU, HTP, HTP FP16, GPU, LPAI
ElementWiseAnd, ElementWiseOr, ElementWiseXor, ElementWiseNot	Element-wise logical	Boolean / integer	CPU, HTP, HTP FP16, GPU, LPAI
Batchnorm, InstanceNorm, LayerNorm, GroupNorm, L2Norm, Lrn	Normalization	FP32 / FP16 / INT8 / INT16	CPU, HTP, HTP FP16, GPU, LPAI
ReduceMax, ReduceMean, ReduceMin, ReduceProd, ReduceSum, ReduceSumSquare	Reduction	FP32 / FP16 / INT8 / INT16	CPU, HTP, HTP FP16, GPU, LPAI
Softmax, LogSoftmax, MaskedSoftmax	Softmax	FP32 / FP16 / INT8 / INT16	CPU, HTP, HTP FP16, GPU, LPAI
Reshape, Squeeze, ExpandDims, Transpose, Permute, Pack, Unpack, Concat, Split, Slice	Tensor shape / layout	Input datatype	CPU, HTP, HTP FP16, GPU, LPAI
Pad, Tile, Gather, GatherElements, GatherNd, OneHot, NonZero	Tensor indexing / construction	FP32 / FP16 / integer / INT8 / INT16	CPU, HTP, HTP FP16, GPU, LPAI
Quantize, Dequantize, Cast, Convert	Datatype conversion	FP32 / FP16 / INT8 / INT16	CPU, HTP, HTP FP16, GPU, LPAI
Resize, CropAndResize, GridSample, ExtractPatches, DepthToSpace, SpaceToDepth, BatchToSpace	Vision / spatial transform	FP32 / FP16 / INT8 / INT16	CPU, HTP, HTP FP16, GPU, LPAI
Argmax, Argmin, TopK	Selection / ranking	FP32 / FP16 / integer / INT8 / INT16	CPU, HTP, HTP FP16, GPU, LPAI
Lstm, Gru	Recurrent	FP32 / FP16 / INT8 / INT16	CPU, HTP, HTP FP16, GPU, LPAI
NonMaxSuppression, MultiClassNms, CombinedNms, BoxWithNmsLimit, DetectionOutput, GenerateProposals, CollectRpnProposals, DistributeFpnProposals, BboxTransform	Detection / proposal generation	FP32 / FP16 / INT8 / INT16	CPU, HTP, HTP FP16, GPU, LPAI

GStreamer elements for NPU use

GStreamer elements typically used for AI applications:

Element	Type	Description
qtimlqnn	QNN inference	Qualcomm QNN-based inference element. This is the most direct GStreamer candidate for QNN/HTP/NPU execution; set the backend from the default CPU library to the HTP backend when validating NPU.
qtimltflite	TensorFlow Lite inference	Runs TFLite models. For NPU testing, use delegate=external with the QNN TFLite delegate and HTP backend options.
qtimlsnpe	SNPE inference	Runs SNPE/DLC models with delegate options for CPU, DSP, GPU, or AIP. Useful for legacy Qualcomm AI pipelines, but QNN/TFLite paths are more relevant for HTP/NPU validation.
qtimlvconverter	Video-to-tensor preprocessing	Converts video/x-raw frames into neural-network/tensors before inference. Used before qtimlqnn, qtimltflite, or qtimlsnpe in video AI pipelines.
qtimlaconverter	Audio-to-tensor preprocessing	Converts mono raw audio into neural-network/tensors. Supports raw, spectrogram, MFE, LMFE, and MFCC features for audio ML pipelines.
qtimlpostprocess	Generic ML postprocessor	Preferred post-processing element for converting inference tensors into video, text, or tensor outputs. Supports modules for detection, classification, segmentation, pose, OCR, depth, face, audio, and related AI tasks.
qtimldemux	Tensor demuxer	Splits batched neural-network/tensors into separate tensor streams. Useful after batched inference or when separating multiple tensor outputs.
qtibatch	Batch muxer	Batches buffers from multiple streams into one output buffer. Useful when testing batched or multi-stream inference.
qtimlmetaextractor	ML metadata extractor	Extracts ML metadata from video buffers into UTF-8 text buffers for logging, debugging, or publishing inference results.
qtimlmetaparser	ML metadata parser	Parses ML metadata from video or text buffers. Useful when converting inference metadata into a structured text representation such as JSON.
qtimetamux	Metadata muxer	Attaches text or optical-flow metadata as GstMeta to raw video/audio buffers, allowing inference results to travel with the media stream.
qtimetatransform	Metadata transform	Filters or transforms metadata attached to video buffers. Useful for ROI-based AI flows or smoothing label/ROI metadata.
qtiobjtracker	Object tracker	Tracks detected objects across frames after detection post-processing. Useful after object detection inference to maintain object IDs over time.
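
Before building a pipeline, it can help to confirm that the elements in the table are actually installed on the image. The following is a minimal sketch, assuming gst-inspect-1.0 is on the PATH. The function name missing_elements and the INSPECT override are illustrative and not part of any Qualcomm tooling.

```shell
#!/bin/sh
# Sketch: report which GStreamer elements are not installed.
# INSPECT can be overridden so the function can be exercised without GStreamer.
INSPECT="${INSPECT:-gst-inspect-1.0}"

missing_elements() {
  # Print the name of every element that the inspector cannot find.
  for element in "$@"; do
    if ! "$INSPECT" --exists "$element" >/dev/null 2>&1; then
      echo "$element"
    fi
  done
}

# Example (run on the board):
#   missing_elements qtimlqnn qtimltflite qtimlvconverter qtimlpostprocess
```

An empty result means every listed element was found.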

Running AI sample applications ^[7]

Qualcomm provides AI sample applications for object detection and parallel inference on the Dragonwing IQ-9075 device. They accept input from a camera, a video file, or an RTSP stream. To run an application, follow this workflow:

Download models and labels
Transfer the downloaded files to the device
Run AI sample applications

Download and transfer AI models and labels

The required models can be downloaded from Qualcomm AI Hub. These are the files required by some of the example applications:

Sample application	Models required
AI object detection	yolox_quantized.tflite
Parallel AI inference	yolox_quantized.tflite
	Inception-v3
	HRNetPose
	DeepLabV3-Plus-MobileNet
Multistream inference	yolox_quantized.tflite
Multistream inference	Inception-v3

To download the files with the automated script, first create a working directory on the board:

WORKING_DIR=~/AI_Examples
mkdir -p "$WORKING_DIR"
cd "$WORKING_DIR"

sudo apt install unzip

Get the script:

curl -L -O https://raw.githubusercontent.com/quic/sample-apps-for-qualcomm-linux/refs/heads/main/qualcomm-linux/scripts/download_artifacts.sh

Make the script executable:

chmod +x download_artifacts.sh

Run the script:

sudo ./download_artifacts.sh
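
After the script finishes, a quick check that the expected files exist can save a failed pipeline run later. The following is a minimal sketch. It assumes the artifacts end up at the paths used in the configuration examples on this page (/etc/models, /etc/labels, /etc/media). The function name missing_artifacts is illustrative.

```shell
#!/bin/sh
# Sketch: list expected artifact files that are missing or empty.
missing_artifacts() {
  for path in "$@"; do
    # -s is true only for files that exist and have a size greater than zero.
    [ -s "$path" ] || echo "$path"
  done
}

# Example (paths taken from the configuration examples on this page):
#   missing_artifacts /etc/models/yolox_quantized.tflite \
#                     /etc/labels/yolox.json \
#                     /etc/media/video.mp4
```

Any path that is printed still needs to be downloaded or transferred to the device.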

gst-ai-object-detection application

To set up the application, edit the configuration file created at /etc/configs/config_detection.json:

sudo vim /etc/configs/config_detection.json

To run with the example video as the source, change the file as follows:

{
  "file-path": "/etc/media/video.mp4",
  "ml-framework": "tflite",
  "yolo-model-type": "yolox",
  "model": "/etc/models/yolox_quantized.tflite",
  "labels": "/etc/labels/yolox.json",
  "threshold": 40,
  "runtime": "dsp",
  "output-type": "waylandsink"
}

To run with camera source:

{
  "camera": 0,
  "ml-framework": "tflite",
  "yolo-model-type": "yolox",
  "model": "/etc/models/yolox_quantized.tflite",
  "labels": "/etc/labels/yolox.json",
  "threshold": 40,
  "runtime": "dsp",
  "output-type": "waylandsink"
}

The following table lists and describes the fields in the config_detection.json file.

Field	Values/description
`ml-framework`	Use one of the following frameworks: `snpe`: Qualcomm® Neural Processing SDK; `tflite`: LiteRT; `qnn`: Qualcomm® AI Engine Direct
`yolo-model-type`	Use one of the following model types: `yolov5`, `yolov8`, `yolox`, or `yolonas`. For more information about models and labels, see the Sample model and label files.
`runtime`	Use one of the following runtimes: `cpu`, `gpu`, `dsp`
Input source	Use one of the following input sources: `camera`: Primary camera `0` or secondary camera `1`; `file-path`: Path of the video file; `rtsp-ip-port`: Address of the RTSP stream in the `rtsp://<ip>:<port>/<stream>` format; `enable-usb-camera`: `TRUE` or `FALSE`
`output-ip-address`	Output server IP address
`port`	Output server port
`output-type`	Use one of the following output types: `waylandsink`: To display the output on Wayland; `filesink`: To store the output in a file; `rtspsink`: To stream the output on the server
USB camera video-format and resolution	Use one of the following video formats: `nv12`, `yuy2`, `mjpeg`. Use the following resolution fields: `width`: Input USB camera source resolution width; `height`: Input USB camera source resolution height; `framerate`: Input USB camera source framerate
`output-file`	Output filename. The default filename is `output_detection.mp4`.
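
A mistyped value in the configuration file is easy to miss. The following is a minimal sketch that checks a value against the options listed in the table above for three of the fields. The function name is_valid_choice is illustrative, and the check covers only `ml-framework`, `yolo-model-type`, and `runtime`. It does not parse the JSON file itself.

```shell
#!/bin/sh
# Sketch: check that a chosen value is among those listed in the field table.
is_valid_choice() {
  # $1 = field name, $2 = value; returns 0 if the value is listed, else 1.
  case "$1:$2" in
    ml-framework:snpe|ml-framework:tflite|ml-framework:qnn) return 0 ;;
    yolo-model-type:yolov5|yolo-model-type:yolov8) return 0 ;;
    yolo-model-type:yolox|yolo-model-type:yolonas) return 0 ;;
    runtime:cpu|runtime:gpu|runtime:dsp) return 0 ;;
    *) return 1 ;;   # unknown field or unlisted value
  esac
}

# Example:
#   is_valid_choice runtime dsp || echo "unsupported runtime" >&2
```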

Run the application:

gst-ai-object-detection

Object Detection with gst-launch-1.0

Object detection using camera and UDP sink

The download_artifacts.sh script downloads the models to /etc/models/.

On the board:

HOST_IP=X.X.X.X
PORT=5000

gst-launch-1.0 -e -v \
  qtiqmmfsrc camera=0 name=camsrc \
  camsrc. ! queue ! \
    'video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1' ! \
    qtivcomposer name=mixer \
      sink_0::dimensions="<1280,720>" \
      sink_0::position="<0,0>" \
      sink_0::zorder=0 ! \
    queue ! \
    'video/x-raw,format=NV12,width=1280,height=720,framerate=30/1' ! \
    v4l2h264enc \
      capture-io-mode=dmabuf \
      output-io-mode=dmabuf-import ! \
    h264parse config-interval=1 ! \
    rtph264pay pt=96 config-interval=1 ! \
    udpsink \
      host=$HOST_IP \
      port=$PORT \
      sync=false \
      async=false \
  camsrc. ! queue ! \
    'video/x-raw,format=NV12,width=640,height=360,framerate=30/1' ! \
    qtimlvconverter ! \
    qtimltflite \
      model=/etc/models/yolox_quantized.tflite \
      delegate=external \
      external-delegate-path=libQnnTFLiteDelegate.so \
      external-delegate-options="QNNExternalDelegate,backend_type=htp" ! \
    qtimlpostprocess \
      module=yolov8 \
      labels=/etc/labels/yolox.json \
      results=10 \
      settings='{"confidence": 40.0}' ! \
    'video/x-raw,format=BGRA,width=640,height=360' ! \
    queue ! \
    mixer.

On the host PC:

PORT=5000
gst-launch-1.0 -v \
  udpsrc port=$PORT caps='application/x-rtp,media=video,encoding-name=H264,payload=96,clock-rate=90000' ! \
  rtph264depay ! \
  h264parse ! \
  avdec_h264 ! \
  videoconvert ! \
  autovideosink sync=false

Object detection using filesrc and UDP sink

On the board:

HOST_IP=X.X.X.X
PORT=5000

gst-launch-1.0 -e -v \
  filesrc location=/etc/media/video.mp4 ! \
    qtdemux ! \
    h264parse ! \
    decodebin ! \
    identity sync=true ! \
    queue ! \
    qtivtransform ! \
    'video/x-raw,format=NV12,width=1280,height=720,framerate=30/1' ! \
    tee name=split \
  split. ! queue ! \
    qtivcomposer name=mixer \
      sink_0::dimensions="<1280,720>" \
      sink_0::position="<0,0>" \
      sink_0::zorder=0 ! \
    queue ! \
    'video/x-raw,format=NV12,width=1280,height=720,framerate=30/1' ! \
    v4l2h264enc \
      capture-io-mode=dmabuf \
      output-io-mode=dmabuf-import ! \
    h264parse config-interval=1 ! \
    rtph264pay pt=96 config-interval=1 ! \
    udpsink \
      host=$HOST_IP \
      port=$PORT \
      sync=false \
      async=false \
  split. ! queue ! \
    qtivtransform ! \
    'video/x-raw,format=NV12,width=640,height=360,framerate=30/1' ! \
    qtimlvconverter ! \
    qtimltflite \
      model=/etc/models/yolox_quantized.tflite \
      delegate=external \
      external-delegate-path=libQnnTFLiteDelegate.so \
      external-delegate-options="QNNExternalDelegate,backend_type=htp" ! \
    qtimlpostprocess \
      module=yolov8 \
      labels=/etc/labels/yolox.json \
      results=10 \
      settings='{"confidence": 40.0}' ! \
    'video/x-raw,format=BGRA,width=640,height=360' ! \
    queue ! \
    mixer.

On the host PC:

PORT=5000
gst-launch-1.0 -v \
  udpsrc port=$PORT caps='application/x-rtp,media=video,encoding-name=H264,payload=96,clock-rate=90000' ! \
  rtph264depay ! \
  h264parse ! \
  avdec_h264 ! \
  videoconvert ! \
  autovideosink sync=false

Object detection using camera and filesink

OUT=ai_object_detection_video.mp4

gst-launch-1.0 -e -v \
  qtiqmmfsrc camera=1 name=camsrc \
  camsrc. ! queue ! \
    'video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1' ! \
    qtivcomposer name=mixer \
      sink_0::dimensions="<1280,720>" \
      sink_0::position="<0,0>" \
      sink_0::zorder=0 ! \
    queue ! \
    'video/x-raw,format=NV12,width=1280,height=720,framerate=30/1' ! \
    v4l2h264enc \
      capture-io-mode=dmabuf \
      output-io-mode=dmabuf-import ! \
    h264parse config-interval=1 ! \
    mp4mux ! \
    filesink location=$OUT \
  camsrc. ! queue ! \
    'video/x-raw,format=NV12,width=640,height=360,framerate=30/1' ! \
    qtimlvconverter ! \
    qtimltflite \
      model=/etc/models/yolox_quantized.tflite \
      delegate=external \
      external-delegate-path=libQnnTFLiteDelegate.so \
      external-delegate-options="QNNExternalDelegate,backend_type=htp" ! \
    qtimlpostprocess \
      module=yolov8 \
      labels=/etc/labels/yolox.json \
      results=10 \
      settings='{"confidence": 40.0}' ! \
    'video/x-raw,format=BGRA,width=640,height=360' ! \
    queue ! \
    mixer.

Checking NPU use

To debug a GStreamer pipeline in more detail, set the following environment variables:

export GST_DEBUG="qtimltflite:6,qtimlvconverter:4,qtimlpostprocess:4,*qnn*:6,*Qnn*:6"
export TFLITE_MINIMAL_LOG_LEVEL=0
export ADSP_LIBRARY_PATH="/usr/lib/rfsa/adsp:/usr/lib:/usr/lib/aarch64-linux-gnu"

To confirm that the NPU is the active backend, run the same pipeline on each backend (HTP, CPU, and GPU) and compare the results.

Set the runtime paths:

export LD_LIBRARY_PATH=/usr/lib:$LD_LIBRARY_PATH
export ADSP_LIBRARY_PATH=/usr/lib/rfsa/adsp:/usr/lib/rfsa/adsp/hexagon-v73:/usr/lib
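
The three benchmark pipelines on this page differ only in the properties passed to qtimltflite. The following is a minimal sketch that collects those properties in one helper, so a single pipeline template can be reused. The function name tflite_backend_args and the DELEGATE_LIB variable are illustrative. The property values are copied from the pipelines on this page.

```shell
#!/bin/sh
# Sketch: print the qtimltflite properties this page uses for each backend.
DELEGATE_LIB="${DELEGATE_LIB:-/usr/lib/libQnnTFLiteDelegate.so}"

tflite_backend_args() {
  case "$1" in
    htp) echo "delegate=external external-delegate-path=$DELEGATE_LIB external-delegate-options=QNNExternalDelegate,backend_type=htp" ;;
    cpu) echo "delegate=none threads=4" ;;
    gpu) echo "delegate=gpu" ;;
    *)   echo "unknown backend: $1" >&2; return 1 ;;
  esac
}

# Example (left unquoted on purpose so the shell splits the properties):
#   gst-launch-1.0 ... ! qtimltflite model=$MODEL $(tflite_backend_args htp) ! ...
```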

HTP/NPU benchmark

Run this pipeline:

MODEL=/etc/models/yolox_quantized.tflite
LABELS=/etc/labels/yolox.json
DELEGATE=/usr/lib/libQnnTFLiteDelegate.so

GST_DEBUG=2 gst-launch-1.0 -e -v \
  qtiqmmfsrc camera=1 ! \
  'video/x-raw,format=NV12,width=1280,height=720,framerate=30/1' ! \
  qtimlvconverter ! \
  qtimltflite \
    model=$MODEL \
    delegate=external \
    external-delegate-path=$DELEGATE \
    external-delegate-options="QNNExternalDelegate,backend_type=htp" ! \
  qtimlpostprocess \
    module=yolov8 \
    labels=$LABELS \
    results=10 \
    settings='{"confidence": 40.0}' ! \
  fpsdisplaysink video-sink=fakesink text-overlay=false sync=false

After 10 seconds, the pipeline produced these results:

...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 234, dropped: 0, current: 30.18, average: 30.03
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 249, dropped: 0, current: 29.76, average: 30.01
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 265, dropped: 0, current: 30.17, average: 30.02
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 280, dropped: 0, current: 29.96, average: 30.02
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 295, dropped: 0, current: 29.91, average: 30.02
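
To make the 10-second measurement repeatable, the run can be stopped automatically and the final average read from the log. The following is a minimal sketch, assuming the timeout utility is available on the board. The function names run_for_seconds and last_average_fps are illustrative.

```shell
#!/bin/sh
# Sketch: stop a command after N seconds and pull the final average FPS
# out of fpsdisplaysink "last-message" lines.

run_for_seconds() {
  # $1 = duration in seconds; remaining arguments = command to run.
  # SIGINT lets "gst-launch-1.0 -e" send EOS and shut down cleanly.
  secs="$1"
  shift
  timeout -s INT "$secs" "$@"
}

last_average_fps() {
  # Reads log text on stdin; prints the "average:" value of the last matching line.
  sed -n 's/.*last-message = rendered:.*average: \([0-9.]*\).*/\1/p' | tail -n 1
}

# Example (on the board, with the pipeline arguments from this section):
#   run_for_seconds 10 gst-launch-1.0 -e -v ... 2>&1 | last_average_fps
```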

CPU usage reported by top:

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2800 ubuntu    20   0 2210588 281000 137748 S   7.9   0.8   0:03.47 gst-launch-1.0

Output when a perf element is added before fpsdisplaysink:

perf: perf0; timestamp: 0:04:01.269363368; bps: 73728000.000; mean_bps: 73728000.000; fps: 29.967; mean_fps: 29.739; cpu: 13;

Output of watch -n 1 cat /sys/class/kgsl/kgsl-3d0/gpubusy:

4610 1000644

CPU baseline benchmark

Run this pipeline:

MODEL=/etc/models/yolox_quantized.tflite
LABELS=/etc/labels/yolox.json

GST_DEBUG=2 gst-launch-1.0 -e -v \
  qtiqmmfsrc camera=1 ! \
  'video/x-raw,format=NV12,width=1280,height=720,framerate=30/1' ! \
  qtimlvconverter ! \
  qtimltflite \
    model=$MODEL \
    delegate=none \
    threads=4 ! \
  qtimlpostprocess \
    module=yolov8 \
    labels=$LABELS \
    results=10 \
    settings='{"confidence": 40.0}' ! \
  fpsdisplaysink video-sink=fakesink text-overlay=false sync=false

This results in the following log after 10 seconds:

...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 25, dropped: 0, current: 3.11, average: 3.23
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 27, dropped: 0, current: 3.10, average: 3.22
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 29, dropped: 0, current: 3.11, average: 3.21
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 31, dropped: 0, current: 3.11, average: 3.20
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 33, dropped: 0, current: 3.11, average: 3.20

CPU usage reported by top:

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
4986 ubuntu    20   0  784476 151296  96640 S  98.3   0.4   0:06.89 gst-launch-1.0

Output when a perf element is added before fpsdisplaysink:

perf: perf0; timestamp: 0:12:15.400971224; bps: 7372800.000; mean_bps: 8192000.000; fps: 3.157; mean_fps: 3.158; cpu: 15;

Output of watch -n 1 cat /sys/class/kgsl/kgsl-3d0/gpubusy:

0       0

GPU baseline benchmark

Run this pipeline:

MODEL=/etc/models/yolox_quantized.tflite
LABELS=/etc/labels/yolox.json

gst-launch-1.0 -e -v \
  qtiqmmfsrc camera=1 ! \
  'video/x-raw,format=NV12,width=1280,height=720,framerate=30/1' ! \
  qtimlvconverter ! \
  qtimltflite \
    model=$MODEL \
    delegate=gpu ! \
  qtimlpostprocess \
    module=yolov8 \
    labels=$LABELS \
    results=10 \
    settings='{"confidence": 40.0}' ! \
  fpsdisplaysink video-sink=fakesink text-overlay=false sync=false

After 10 seconds, the pipeline produced these results:

...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 177, dropped: 0, current: 20.01, average: 20.43
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 188, dropped: 0, current: 20.41, average: 20.43
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 199, dropped: 0, current: 20.55, average: 20.44
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 210, dropped: 0, current: 20.63, average: 20.45
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 221, dropped: 0, current: 20.22, average: 20.44

CPU usage reported by top:

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
6122 ubuntu    20   0 1000168 241008 163632 S  20.9   0.7   0:04.69 gst-launch-1.0

Output when a perf element is added before fpsdisplaysink:

perf: perf0; timestamp: 0:10:20.605190113; bps: 51609600.000; mean_bps: 49971200.000; fps: 20.358; mean_fps: 20.372; cpu: 13;

Output of watch -n 1 cat /sys/class/kgsl/kgsl-3d0/gpubusy:

874231 1007485

Summary of results

Tested on: Ubuntu 24.04

Resolution	Backend	CPU use (%)	GPU use (%)	Frames rendered after 10 s	Average FPS
360p	HTP	7.9	0.38	296	30.02
	CPU	98.7	0	47	3.18
	GPU	20.9	88.86	221	20.51
720p	HTP	7.9	0.46	295	30.02
	CPU	98.3	0	33	3.20
	GPU	20.9	86.77	221	20.44
1080p	HTP	7.6	0.49	298	30.02
	CPU	98.7	0	33	3.26
	GPU	21.3	87.66	209	20.39

The GPU use percentage is calculated from the two values in /sys/class/kgsl/kgsl-3d0/gpubusy (busy first, total second) with the formula: GPU busy raw / GPU total raw * 100
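
The formula above can be applied directly to the file contents. The following is a minimal sketch. The function name gpu_busy_pct is illustrative, and reporting 0.00 when the total is zero is a choice made here to avoid dividing by zero.

```shell
#!/bin/sh
# Sketch: convert a "busy total" pair from gpubusy into a percentage.
gpu_busy_pct() {
  # Reads one line "BUSY TOTAL" on stdin; prints BUSY / TOTAL * 100 with two decimals.
  # A total of 0 (idle GPU, as in the CPU run) is reported as 0.00.
  awk '{ if ($2 > 0) printf "%.2f\n", $1 / $2 * 100; else print "0.00" }'
}

# Example (on the board):
#   gpu_busy_pct < /sys/class/kgsl/kgsl-3d0/gpubusy
```

With the 720p samples shown earlier, `4610 1000644` gives 0.46 and `874231 1007485` gives 86.77, matching the summary table.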

[1] Qualcomm, "IQ-9075," https://www.qualcomm.com/internet-of-things/products/iq9-series/iq-9075.
[2] Qualcomm, "IQ-9075 Documentation," https://docs.qualcomm.com/doc/80-75286-1/topic/iq-9075-hw-docs-homepage.html.
[3] Qualcomm, "Hardware overview - Qualcomm Dragonwing IQ-9075 Evaluation Kit," https://docs.qualcomm.com/doc/80-80020-261/topic/iq9-ug-hw-overview.html.
[4] Qualcomm, "Device description - QCS9075 Data Sheet," https://docs.qualcomm.com/doc/80-73417-1/topic/device-description.html.
[5] Qualcomm, "HTP - Qualcomm AI Runtime SDK," https://docs.qualcomm.com/doc/80-63442-10/topic/htp_backend.html.
[6] Qualcomm, "Supported Operations - Qualcomm AI Runtime SDK," https://docs.qualcomm.com/doc/80-63442-10/topic/SupportedOps.html.
[7] https://docs.qualcomm.com/doc/80-80021-261/topic/iq9-ug-run-sample-apps.html#procedure-ai
